Goto

Collaborating Authors

 approach 1


Dimension-adapted Momentum Outscales SGD

Neural Information Processing Systems

We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target complexities. While traditional stochastic gradient descent with momentum (SGD-M) yields identical scaling law exponents to SGD, dimension-adapted Nesterov acceleration (DANA) improves these exponents by scaling momentum hyperparameters based on model size and data complexity. This outscaling phenomenon, which also improves compute-optimal scaling behavior, is achieved by DANA across a broad range of data and target complexities, while traditional methods fall short. Extensive experiments on high-dimensional synthetic quadratics validate our theoretical predictions and large-scale text experiments with LSTMs show DANA's improved loss exponents over SGD hold in a practical setting.


Assessing model error in counterfactual worlds

arXiv.org Artificial Intelligence

Counterfactual scenario modeling exercises that ask "what would happen if?" are one of the most common ways we plan for the future. Despite their ubiquity in planning and decision making, scenario projections are rarely evaluated retrospectively. Differences between projections and observations come from two sources: scenario deviation and model miscalibration. We argue the latter is most important for assessing the value of models in decision making, but requires estimating model error in counterfactual worlds. Here we present and contrast three approaches for estimating this error, and demonstrate the benefits and limitations of each in a simulation experiment. We provide recommendations for the estimation of counterfactual error and discuss the components of scenario design that are required to make scenario projections evaluable.


Enhanced Sentiment Analysis of Iranian Restaurant Reviews Utilizing Sentiment Intensity Analyzer & Fuzzy Logic

arXiv.org Artificial Intelligence

This research presents an advanced sentiment analysis framework studied on Iranian restaurant reviews, combining fuzzy logic with conventional sentiment analysis techniques to assess both sentiment polarity and intensity. A dataset of 1266 reviews, alongside corresponding star ratings, was compiled and preprocessed for analysis. Initial sentiment analysis was conducted using the Sentiment Intensity Analyzer (VADER), a rule-based tool that assigns sentiment scores across positive, negative, and neutral categories. However, a noticeable bias toward neutrality often led to an inaccurate representation of sentiment intensity. To mitigate this issue, based on a fuzzy perspective, two refinement techniques were introduced, applying square-root and fourth-root transformations to amplify positive and negative sentiment scores while maintaining neutrality. This led to three distinct methodologies: Approach 1, utilizing unaltered VADER scores; Approach 2, modifying sentiment values using the square root; and Approach 3, applying the fourth root for further refinement. A Fuzzy Inference System incorporating comprehensive fuzzy rules was then developed to process these refined scores and generate a single, continuous sentiment value for each review based on each approach. Comparative analysis, including human supervision and alignment with customer star ratings, revealed that the refined approaches significantly improved sentiment analysis by reducing neutrality bias and better capturing sentiment intensity. Despite these advancements, minor over-amplification and persistent neutrality in domain-specific cases were identified, leading us to propose several future studies to tackle these occasional barriers. The study's methodology and outcomes offer valuable insights for businesses seeking a more precise understanding of consumer sentiment, enhancing sentiment analysis across various industries.


Gemstones: A Model Suite for Multi-Faceted Scaling Laws

arXiv.org Artificial Intelligence

Our models, called the Gemstones Scaling laws are typically fit using a family of because they are loosely based on scaled-down variants models with a narrow range of frozen hyperparameter of the Gemma architecture, vary in their parameter count, choices. In this work we study scaling width/depth ratio, training tokens, learning rates, and laws using a wide range of architecture and hyperparameter cooldown schedules. By fitting scaling laws to these choices, and highlight their impact on checkpoints, we confirm that scaling law parameters and resulting prescriptions. As a primary artifact of interpretations indeed depend strongly on the selection of our research, we release the Gemstones: the most models and fitting procedure used, and we quantify the comprehensive open-source scaling law dataset degree to which these decisions impact predictions. By to date, consisting of over 4000 checkpoints from exploiting the variation among our model checkpoints, we transformers with up to 2 billion parameters; these also fit a number of unique scaling laws and analyze their models have been trained with different learning predictions to discern whether they are consistent with rates, cooldown schedules, and architectural design choices we see in industry models.


Dynamic Multimodal Sentiment Analysis: Leveraging Cross-Modal Attention for Enabled Classification

arXiv.org Artificial Intelligence

This paper explores the development of a multimodal sentiment analysis model that integrates text, audio, and visual data to enhance sentiment classification. The goal is to improve emotion detection by capturing the complex interactions between these modalities, thereby enabling more accurate and nuanced sentiment interpretation. The study evaluates three feature fusion strategies -- late stage fusion, early stage fusion, and multi-headed attention -- within a transformer-based architecture. Experiments were conducted using the CMU-MOSEI dataset, which includes synchronized text, audio, and visual inputs labeled with sentiment scores. Results show that early stage fusion significantly outperforms late stage fusion, achieving an accuracy of 71.87\%, while the multi-headed attention approach offers marginal improvement, reaching 72.39\%. The findings suggest that integrating modalities early in the process enhances sentiment classification, while attention mechanisms may have limited impact within the current framework. Future work will focus on refining feature fusion techniques, incorporating temporal data, and exploring dynamic feature weighting to further improve model performance.


Lexicographic optimization-based approaches to learning a representative model for multi-criteria sorting with non-monotonic criteria

arXiv.org Artificial Intelligence

Deriving a representative model using value function-based methods from the perspective of preference disaggregation has emerged as a prominent and growing topic in multi-criteria sorting (MCS) problems. A noteworthy observation is that many existing approaches to learning a representative model for MCS problems traditionally assume the monotonicity of criteria, which may not always align with the complexities found in real-world MCS scenarios. Consequently, this paper proposes some approaches to learning a representative model for MCS problems with non-monotonic criteria through the integration of the threshold-based value-driven sorting procedure. To do so, we first define some transformation functions to map the marginal values and category thresholds into a UTA-like functional space. Subsequently, we construct constraint sets to model non-monotonic criteria in MCS problems and develop optimization models to check and rectify the inconsistency of the decision maker's assignment example preference information. By simultaneously considering the complexity and discriminative power of the models, two distinct lexicographic optimization-based approaches are developed to derive a representative model for MCS problems with non-monotonic criteria. Eventually, we offer an illustrative example and conduct comprehensive simulation experiments to elaborate the feasibility and validity of the proposed approaches.


Chinchilla Scaling: A replication attempt

arXiv.org Artificial Intelligence

Hoffmann et al. (2022) propose three methods for estimating a compute-optimal scaling law. We attempt to replicate their third estimation procedure, which involves fitting a parametric loss function to a reconstruction of data from their plots. We find that the reported estimates are inconsistent with their first two estimation methods, fail at fitting the extracted data, and report implausibly narrow confidence intervals--intervals this narrow would require over 600,000 experiments, while they likely only ran fewer than 500. In contrast, our rederivation of the scaling law using the third approach yields results that are compatible with the findings from the first two estimation procedures described by Hoffmann et al.


Software Mention Recognition with a Three-Stage Framework Based on BERTology Models at SOMD 2024

arXiv.org Artificial Intelligence

This paper describes our systems for the sub-task I in the Software Mention Detection in Scholarly Publications shared-task. We propose three approaches leveraging different pre-trained language models (BERT, SciBERT, and XLM-R) to tackle this challenge. Our bestperforming system addresses the named entity recognition (NER) problem through a three-stage framework. (1) Entity Sentence Classification - classifies sentences containing potential software mentions; (2) Entity Extraction - detects mentions within classified sentences; (3) Entity Type Classification - categorizes detected mentions into specific software types. Experiments on the official dataset demonstrate that our three-stage framework achieves competitive performance, surpassing both other participating teams and our alternative approaches. As a result, our framework based on the XLM-R-based model achieves a weighted F1-score of 67.80%, delivering our team the 3rd rank in Sub-task I for the Software Mention Recognition task.


Universal Auto-encoder Framework for MIMO CSI Feedback

arXiv.org Artificial Intelligence

Existing auto-encoder (AE)-based channel state information (CSI) frameworks have focused on a specific configuration of user equipment (UE) and base station (BS), and thus the input and output sizes of the AE are fixed. However, in the real-world scenario, the input and output sizes may vary depending on the number of antennas of the BS and UE and the allocated resource block in the frequency dimension. A naive approach to support the different input and output sizes is to use multiple AE models, which is impractical for the UE due to the limited HW resources. In this paper, we propose a universal AE framework that can support different input sizes and multiple compression ratios. The proposed AE framework significantly reduces the HW complexity while providing comparable performance in terms of compression ratio-distortion trade-off compared to the naive and state-of-the-art approaches.


Text-To-KG Alignment: Comparing Current Methods on Classification Tasks

arXiv.org Artificial Intelligence

In contrast to large text corpora, knowledge graphs (KG) provide dense and structured representations of factual information. This makes them attractive for systems that supplement or ground the knowledge found in pre-trained language models with an external knowledge source. This has especially been the case for classification tasks, where recent work has focused on creating pipeline models that retrieve information from KGs like ConceptNet as additional context. Many of these models consist of multiple components, and although they differ in the number and nature of these parts, they all have in common that for some given text query, they attempt to identify and retrieve a relevant subgraph from the KG. Due to the noise and idiosyncrasies often found in KGs, it is not known how current methods compare to a scenario where the aligned subgraph is completely relevant to the query. In this work, we try to bridge this knowledge gap by reviewing current approaches to text-to-KG alignment and evaluating them on two datasets where manually created graphs are available, providing insights into the effectiveness of current methods.